What the Vec? Towards Probabilistically Grounded Embeddings
Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding
algorithms. Their embeddings are widely used and perform well on a variety of
natural language processing tasks. Moreover, W2V has recently been adopted in
the field of graph embedding, where it underpins several leading algorithms.
However, despite their ubiquity and relatively simple model architecture, a
theoretical understanding of what the embedding parameters of W2V and GloVe
learn and why that is useful in downstream tasks has been lacking. We show that
different interactions between PMI vectors reflect semantic word relationships,
such as similarity and paraphrasing, that are encoded in low dimensional word
embeddings under a suitable projection, theoretically explaining why embeddings
of W2V and GloVe work. As a consequence, we also reveal an interesting
mathematical interconnection between the considered semantic relationships
themselves.
Comment: Advances in Neural Information Processing, 201
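As a rough illustration of the idea, PMI vectors can be built from word co-occurrence counts and projected to a low dimension; the toy counts, the dimension `d`, and the truncated-SVD projection below are illustrative placeholders, not the paper's exact construction:

```python
import numpy as np

# Toy word co-occurrence counts (illustrative data); rows/cols index words.
counts = np.array([[10., 5., 1., 0.],
                   [ 5., 10., 2., 1.],
                   [ 1., 2., 8., 6.],
                   [ 0., 1., 6., 9.]])

# PMI_ij = log( p(i, j) / (p(i) p(j)) ), clipped to avoid log(0).
total = counts.sum()
p_ij = counts / total
p_i = counts.sum(axis=1) / total
p_j = counts.sum(axis=0) / total
pmi = np.log(np.maximum(p_ij / np.outer(p_i, p_j), 1e-12))

# Low-dimensional word embeddings via a rank-d projection (truncated SVD);
# each row of `emb` is one word's embedding.
d = 2
U, S, Vt = np.linalg.svd(pmi)
emb = U[:, :d] * S[:d]
```

Interactions between the rows of `pmi` (e.g. dot products) are then approximately preserved by the corresponding rows of `emb`, which is the sense in which semantic relationships survive the projection.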
TuckER: Tensor Factorization for Knowledge Graph Completion
Knowledge graphs are structured representations of real world facts. However,
they typically contain only a small subset of all possible facts. Link
prediction is a task of inferring missing facts based on existing ones. We
propose TuckER, a relatively straightforward but powerful linear model based on
Tucker decomposition of the binary tensor representation of knowledge graph
triples. TuckER outperforms previous state-of-the-art models across standard
link prediction datasets, acting as a strong baseline for more elaborate
models. We show that TuckER is a fully expressive model, derive sufficient
bounds on its embedding dimensionalities and demonstrate that several
previously introduced linear models can be viewed as special cases of TuckER.
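The Tucker-decomposition score of a triple (subject, relation, object) can be sketched as a three-way contraction of a shared core tensor with the subject, relation, and object embeddings; the dimensions and random values below are toy placeholders:

```python
import numpy as np

# score(s, r, o) = W x1 e_s x2 w_r x3 e_o: contract the core tensor W
# with the subject, relation, and object embeddings along its three modes.
de, dr = 4, 3                          # entity / relation embedding dims (toy sizes)
rng = np.random.default_rng(0)
W = rng.normal(size=(de, dr, de))      # shared core tensor
e_s = rng.normal(size=de)              # subject entity embedding
w_r = rng.normal(size=dr)              # relation embedding
e_o = rng.normal(size=de)              # object entity embedding

# Single einsum performs all three mode contractions at once.
score = np.einsum('ijk,i,j,k->', W, e_s, w_r, e_o)
```

In practice this raw score is passed through a sigmoid and trained against observed triples; only the contraction itself is shown here.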
Interpreting Knowledge Graph Relation Representation From Word Embeddings
Many models learn representations of knowledge graph data by exploiting its
low-rank latent structure, encoding known relations between entities and
enabling unknown facts to be inferred. To predict whether a relation holds
between entities, embeddings are typically compared in the latent space
following a relation-specific mapping. Whilst their predictive performance has
steadily improved, how such models capture the underlying latent structure of
semantic information remains unexplained. Building on recent theoretical
understanding of word embeddings, we categorise knowledge graph relations into
three types and for each derive explicit requirements of their representations.
We show that empirical properties of relation representations and the relative
performance of leading knowledge graph representation methods are justified by
our analysis.
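A minimal sketch of the generic setup described above: a relation-specific mapping is applied to the subject embedding, and the result is compared with the object embedding in the latent space. The linear mapping and dot-product comparison here are one common choice, used only for illustration:

```python
import numpy as np

# Generic link-prediction score: map the subject embedding with a
# relation-specific matrix M_r, then compare with the object embedding
# by dot product (sizes and values are illustrative).
d = 4
rng = np.random.default_rng(1)
e_s = rng.normal(size=d)          # subject entity embedding
e_o = rng.normal(size=d)          # object entity embedding
M_r = rng.normal(size=(d, d))     # relation-specific mapping

score = (M_r @ e_s) @ e_o         # higher score => relation more likely to hold
```

Different relation types then correspond to different structural requirements on `M_r`, which is what the categorisation in the abstract analyses.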
Hypernetwork Knowledge Graph Embeddings
Knowledge graphs are graphical representations of large databases of facts,
which typically suffer from incompleteness. Inferring missing relations (links)
between entities (nodes) is the task of link prediction. A recent
state-of-the-art approach to link prediction, ConvE, implements a convolutional
neural network to extract features from concatenated subject and relation
vectors. Whilst results are impressive, the method is unintuitive and poorly
understood. We propose a hypernetwork architecture that generates simplified
relation-specific convolutional filters that (i) outperforms ConvE and all
previous approaches across standard datasets; and (ii) can be framed as tensor
factorization and thus set within a well established family of factorization
models for link prediction. We thus demonstrate that convolution simply offers
a convenient computational means of introducing sparsity and parameter tying to
find an effective trade-off between non-linear expressiveness and the number of
parameters to learn.
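The hypernetwork idea can be sketched as follows: a relation embedding is passed through a (here single-layer) hypernetwork to produce relation-specific 1D convolutional filters, which are then convolved with the subject embedding to extract features. All sizes and weights below are illustrative placeholders:

```python
import numpy as np

d, n_filters, k = 8, 2, 3                 # embedding dim, filters per relation, filter length
rng = np.random.default_rng(0)
H = rng.normal(size=(d, n_filters * k))   # hypernetwork weights (illustrative)
w_r = rng.normal(size=d)                  # relation embedding
e_s = rng.normal(size=d)                  # subject entity embedding

# The hypernetwork maps the relation embedding to a set of filters.
filters = (w_r @ H).reshape(n_filters, k)

# Valid 1D convolution of each relation-specific filter with the subject embedding.
feats = np.stack([np.convolve(e_s, f, mode='valid') for f in filters])
```

Because the filter weights are a linear function of the relation embedding, the whole operation can be rewritten as a tensor factorization, which is the framing the abstract refers to.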
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts (prompts that contain neither task-specific templates nor training examples) and achieve accuracy competitive with manually tuned prompts across a wide range of tasks. While finetuning LMs does introduce new parameters for each downstream task, we show that this memory overhead can be substantially reduced: finetuning only the bias terms achieves comparable or better accuracy than standard finetuning while updating only 0.1% of the parameters. All in all, we recommend finetuning LMs for few-shot learning as it is more accurate, has relatively stable performance across different prompts, and can be made nearly as efficient as using frozen LMs.
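Bias-only finetuning can be sketched on a toy linear model: the pretrained weight matrix stays frozen while gradient steps update only the bias vector. The model, squared-error loss, and learning rate here are illustrative stand-ins, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # frozen "pretrained" weights, never updated
b = np.zeros(3)               # trainable bias terms (the ~0.1% of parameters)

x = rng.normal(size=4)        # a single toy input
y = np.array([1.0, 0.0, 0.0]) # its target output
lr = 0.1

for _ in range(100):
    pred = x @ W + b
    grad = 2 * (pred - y)     # gradient of squared error w.r.t. b
    b -= lr * grad            # update only the bias; W stays frozen
```

Even with `W` untouched, the bias absorbs the residual on this toy example, mirroring the abstract's point that updating bias terms alone can suffice.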